The chloroplast genome sequence and phylogenetic analysis of Apocynum venetum L.

Apocynum venetum L. (Apocynaceae) is valuable for its medicinal compounds and fiber content. Native A. venetum populations are threatened and require protection. Wild A. venetum resources are limited relative to market demand, and the composition of A. venetum at the molecular level is poorly understood. The chloroplast genome contains genetic markers for phylogenetic analysis, genetic diversity evaluation, and molecular identification. In this study, the complete chloroplast (cp) genome of A. venetum was sequenced and analyzed. The A. venetum cp genome is 150,878 bp long and contains a pair of inverted repeat regions (IRA and IRB) of 25,810 bp each, separated by a large single-copy region (LSC, 81,951 bp) and a small single-copy region (SSC, 17,307 bp). The genome-wide GC content was 38.35%; the GC content was 36.49% in the LSC, 32.41% in the SSC, and 43.3% in the IR regions. The A. venetum chloroplast genome encodes 131 genes, including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. This study identified the unique characteristics of the A. venetum chloroplast genome, which will help formulate effective conservation and management strategies as well as molecular identification approaches for this important medicinal plant.


Introduction
Apocynum venetum L. (Apocynaceae) (Luobuma in Chinese) is a perennial herb distributed across Eurasia from Southeast Europe to Northern China. It occurs in floodplains and valleys along rivers such as the Tarim River [1,2]. The roots, stems, leaves, and flowers of A. venetum have medicinal uses [3,4], which were documented in the "Compendium of Materia Medica." In 1977, A. venetum was listed in the Pharmacopoeia of the People's Republic of China as a primary treatment for hypertension and hyperlipidemia [5][6][7][8]. Pharmacological studies have demonstrated that A. venetum has cardiotonic [9], hepatoprotective [10,11], antioxidant [12][13][14], and antidepressant and anxiolytic effects [15][16][17][18]. A. venetum may be useful for the prevention and treatment of cardiovascular and neurological conditions such as high blood pressure, high cholesterol, neurasthenia, depression, and anxiety [19][20][21][22][23].
manufacturer's instructions (NEBNext® Ultra™ DNA Library Prep Kit for Illumina®). The library was sequenced on an Illumina NovaSeq platform (Benagen Tech Solution Co. Ltd., Wuhan, China) and 150-bp paired-end reads were generated. In the quality control step, reads containing Illumina PCR adapters, low-quality reads, and reads containing more than 5% unknown nucleotides ("Ns") were filtered from the paired-end raw reads. All good-quality paired clean reads were obtained using SOAPnuke software (version: 1.3.0). The clean reads were assembled by bidirectional iterative extension using NOVOPlasty (version: 3.13.1, parameter: k-mer = 127) to obtain the whole-genome sequence. The cp-like reads were used to assemble sequences using NOVOPlasty, which assembled the partial reads and extended them as far as possible until a circular genome was formed. All circularized sequences were searched by BLASTN (version: BLAST 2.2.30+, E-value ≤ 1e-5) against the reference database. Sequences with an alignment longer than 1,000 bp and coverage greater than 90% were retained. Based on sequencing depth, paired-end read alignment, and alignment with species closely related to A. venetum, the candidate sequences were ordered and connected to determine whether they formed a circle. When a gap (including N sequence) appeared, GapCloser (version: 1.12) was used to fill the gap and obtain the final assembly [43]. After filtering out repeated sequences and sequences shorter than 300 bp, 48 sequences with start codons of ATG, TTG, CTG, ATT, ATC, GTG, and ATA and stop codons of TGA, TAG, and TAA were retained for subsequent analysis.
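The read-level and contig-level filters described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SOAPnuke or BLASTN implementation: the function names and the `Hit` container are hypothetical, and coverage is assumed to mean aligned length divided by query length, which the text does not define.

```python
from dataclasses import dataclass


def n_fraction(read: str) -> float:
    """Fraction of bases in a read that are unknown (N)."""
    if not read:
        return 1.0  # treat an empty read as unusable
    return read.upper().count("N") / len(read)


def keep_pair(read1: str, read2: str, max_n: float = 0.05) -> bool:
    """Keep a read pair only if neither mate has more than max_n unknown bases."""
    return n_fraction(read1) <= max_n and n_fraction(read2) <= max_n


@dataclass
class Hit:
    """Hypothetical summary of one BLASTN hit for a candidate sequence."""
    query_id: str
    aligned_length: int  # number of aligned bases
    query_length: int    # full length of the candidate sequence


def keep_hit(hit: Hit, min_aligned: int = 1000, min_coverage: float = 0.90) -> bool:
    """Retain hits with alignment > 1,000 bp and coverage > 90% (assumed definition)."""
    coverage = hit.aligned_length / hit.query_length
    return hit.aligned_length > min_aligned and coverage > min_coverage


if __name__ == "__main__":
    clean = "ACGT" * 25                # 100 bases, no Ns
    noisy = "N" * 6 + "ACGT" * 23 + "AC"  # 100 bases, 6% Ns
    print(keep_pair(clean, clean))     # pair passes the N filter
    print(keep_pair(clean, noisy))     # pair is discarded
    print(keep_hit(Hit("contig_1", 1500, 1600)))
```

Filtering on the pair rather than on single reads keeps the two mates synchronized, which downstream paired-end assemblers generally expect.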

Annotation and analysis of the cpDNA sequences
The cp genome sequence was annotated using the DOGMA program (http://dogma.ccbb.utexas.edu/) [44], and the tRNAscan-SE program was used to predict tRNAs in the genome [45]. The circular maps were drawn with the OGDRAW v1.2 program [46] (http://ogdraw.mpimp-golm.mpg.de/). To eliminate the influence of amino acid composition on codon usage, variation in synonymous codon usage was characterized by relative synonymous codon usage (RSCU) values; RSCU, base composition, and codon content were analyzed using MEGA 7.0. Simple sequence repeats (SSRs) in the cp genome were identified using SSRHunter software (http://www.biosoft.net) [47,48]. The minimum number of repeat units was set to five for mononucleotide and dinucleotide SSRs and to three for trinucleotide, tetranucleotide, and pentanucleotide SSRs.
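RSCU is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons for the same amino acid were used equally. The sketch below illustrates that definition only; it is not the MEGA implementation, and the function name `rscu` is hypothetical. It uses the standard amino acid assignments, which the plastid code shares.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """Return {codon: RSCU} pooled over an iterable of in-frame coding sequences."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:      # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    # Toy input: three leucine codons (TTA, TTA, TTG).
    for codon, value in sorted(rscu(["TTATTATTG"]).items()):
        if value > 1:
            print(codon, round(value, 2))
```

A value above 1 marks a codon used more often than expected under equal usage, which is the criterion for a preferred codon.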

Phylogenetic analysis
We downloaded 21 cp genome sequences from the NCBI organelle genome and nucleotide resource database and used all of them for phylogenetic analysis. ClustalW2 software (Conway Institute of Biomolecular and Biomedical Research, Dublin, Ireland) was used to align the genome sequences [50-53]. We used MEGA 7.0 to construct and draw a maximum likelihood (ML) phylogenetic tree. Bootstrap analysis was performed using 1,000 replicates and TBR branch exchanges [54-56].
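The resampling step behind a bootstrap analysis can be illustrated as follows. This is a sketch of the general technique, not of MEGA's internals: alignment columns are drawn with replacement to build each pseudo-replicate, and a tree would then be inferred from every replicate by the chosen method. The function names and the dictionary-based alignment format are assumptions for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same column indices are applied to every sequence, so each
    # resampled column is still a homologous site across all taxa.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Yield n_replicates pseudo-alignments (1,000 matches the text)."""
    rng = random.Random(seed)
    for _ in range(n_replicates):
        yield bootstrap_alignment(alignment, rng)


if __name__ == "__main__":
    toy = {"taxon_a": "ACGTAC", "taxon_b": "ACGAAC", "taxon_c": "TCGAAC"}
    for replicate in bootstrap_replicates(toy, n_replicates=3):
        print(replicate)
```

The bootstrap support of a branch is then the percentage of replicate trees in which that branch appears.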

Features of A. venetum cpDNA
The complete cp genome of A. venetum is 150,878 bp in length (GenBank accession number: MT568765) (Fig 1) and includes a pair of inverted repeats (IR), each 25,810 bp long, separated by a large single-copy region (LSC) of 81,951 bp and a small single-copy region (SSC) of 17,307 bp (Table 1). It is similar to the cp genomes of other Apocynaceae species [57]. In the A. venetum cp genome, 131 functional genes were predicted, including eight rRNA genes, 37 tRNA genes, and 86 protein-coding genes (Table 2). The IR regions contain 33 duplicated genes, including approximately 15 tRNA genes (tRNAs), eight rRNA genes (rRNAs), and nine protein-coding genes (PCGs) (Fig 1). The LSC region includes 58 protein-coding and 22 tRNA genes, while the SSC region includes one tRNA gene and 11 protein-coding genes.

PLOS ONE
The chloroplast genome sequence of Apocynum venetum L.
The tRNA and protein-coding gene sequences of the A. venetum cp genome were analyzed, and the codon usage frequency was inferred and summarized. A total of 17,318 codons represent the coding ability of 86 protein-coding genes and tRNA genes of A. venetum (Table 4), of which 1,814 codons code for leucine (10.47%) and 319 codons code for tryptophan (1.84%); these are the most and least common amino acids in the cp genome of A. venetum, respectively. Codons ending in A and U are very common. Except for trnL-CAA, all preferred synonymous codons (RSCU > 1) end in A or U. There are 14 intron-containing genes, including nine protein-coding genes and five tRNA genes (Table 3). Twelve genes (seven protein-coding and five tRNA genes) contain one intron, and two genes (ycf3 and clpP) contain two introns (Table 3). The intron of the trnK-UUU gene, which contains the matK gene, was 2,474 bp. The rps12 gene is a trans-spliced gene with the 5' end in the LSC region and the 3' end in the IR region.
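The figures reported for the genome can be cross-checked with simple arithmetic. The short script below uses only values stated in the paper (the regional GC contents come from the abstract); the variable names are illustrative, and the GC check assumes the genome-wide value is the length-weighted mean of the regional values.

```python
# Region lengths in bp and regional GC contents in percent, as reported.
LSC, SSC, IR = 81_951, 17_307, 25_810
GC = {"LSC": 36.49, "SSC": 32.41, "IR": 43.3}

# The circular genome is LSC + SSC + two copies of the inverted repeat.
genome_length = LSC + SSC + 2 * IR

# Gene inventory: protein-coding + rRNA + tRNA genes.
total_genes = 86 + 8 + 37

# Length-weighted mean GC content across the regions.
weighted_gc = (LSC * GC["LSC"] + SSC * GC["SSC"] + 2 * IR * GC["IR"]) / genome_length

# Codon shares for the most and least common amino acids.
total_codons = 17_318
leu_share = 100 * 1_814 / total_codons
trp_share = 100 * 319 / total_codons

# Intron-containing genes: 12 with one intron plus 2 with two introns.
intron_genes = 12 + 2

print(genome_length)          # 150878
print(total_genes)            # 131
print(round(weighted_gc, 2))  # 38.35
print(round(leu_share, 2), round(trp_share, 2))  # 10.47 1.84
print(intron_genes)           # 14
```

All five results agree with the values given in the text.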

Comparative analysis of genomic structure
Comparative genome analysis permits the examination of how DNA sequences diverge among related species. The whole cp genome sequence of A. venetum was compared to the sequences of N. attenuata, G. hirsutum, and A. thaliana. The sequence identities of the four cp genomes were plotted using mVISTA with the annotation of N. attenuata as a reference (Fig 2). The variation in the LSC and SSC regions was significantly greater than that in the IR regions. Moreover, the coding regions were more conserved than the non-coding regions. The most divergent regions of the four cp genomes were trnH-psbA, psbM-petN, trnC-GCA-petN, trnE-UUC-rpoB, trnY-GUA-trnE-UUC, trnV-UAC-ndhC, rbcL-accD, and accD-psaI in the LSC and rpl32-trnL-UAG, ndhI-ndhG, and ycf1-rps15 in the SSC, whereas the plastid rRNA genes (rrn4.5, rrn5, rrn16, and rrn23) were the most conserved.

Repeat sequence analysis
We studied the type, presence, and distribution of SSRs in the cp genome of A. venetum. A total of 273 SSRs were found, most of which were distributed in the LSC and SSC regions, with some in the IR regions. These included 105 mononucleotide SSRs (38.46%), 142 dinucleotide SSRs (52.01%), 10 trinucleotide SSRs, 14 tetranucleotide SSRs, and two pentanucleotide SSRs. The mononucleotide A and T repeat units accounted for the largest portion.
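A simple SSR scan can be sketched as below, using the minimum repeat counts given in the methods (five for mono- and dinucleotide units, three for tri-, tetra-, and pentanucleotide units). This is an illustrative scanner, not the SSRHunter algorithm, and the function names are hypothetical; its counts on real data may differ from those reported because tools differ in how they treat overlapping and compound repeats.

```python
# Minimum number of repeat units per unit length, as stated in the methods.
MIN_REPEATS = {1: 5, 2: 5, 3: 3, 4: 3, 5: 3}


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter motif (e.g. 'ATAT')."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return a list of (start, unit, repeat_count) for perfect tandem repeats."""
    seq = sequence.upper()
    hits = []
    for k, n_min in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            # Skip ambiguous bases and units already covered by a shorter motif.
            if set(unit) - set("ACGT") or not is_primitive(unit):
                i += 1
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == unit:
                n += 1
            if n >= n_min:
                hits.append((i, unit, n))
                i += n * k  # continue after the repeat
            else:
                i += 1
    return hits


if __name__ == "__main__":
    toy = "GGC" + "A" * 6 + "C" + "AT" * 5 + "G"
    for start, unit, count in find_ssrs(toy):
        print(start, unit, count)
```

Rejecting non-primitive units prevents a run such as AAAAAA from being counted again as a dinucleotide or trinucleotide repeat.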

Phylogenetic analysis
The cpDNA gene content is highly conserved in most land plants. We downloaded 21 complete cp genome sequences from the NCBI Organelle Genome Resources database to determine the phylogenetic position of A. venetum (Fig 3) and constructed a phylogenetic tree from these cp genomes. The tree was separated into four clusters. Vitis vinifera was placed on a single terminal branch. Phylogenetic analysis showed that Glycine max, Ricinus communis, Populus trichocarpa, Prunus persica, Medicago truncatula, Capsella rubella, A. thaliana, and


Discussion
In this study, we assembled, annotated and analyzed the complete cp sequence of A. venetum. We then analyzed its features, GC content, gene structure, and repeat sequences.  [58][59][60][61][62][63]. The GC content of the IR regions is higher than that of the other regions (LSC, SSC); this phenomenon is common in other plants [64][65][66]. The relatively high GC content of the IR regions was mainly attributed to the rRNA and tRNA genes [67,68]. Cp sequences have been used to compare the genetics of plant species, gene flow between species, and the size of ancestral populations of sister species [69]. Therefore, it is necessary to understand cp differences among species. We observed approximately the same gene order and organization of coding regions among the cp genomes (Fig 2). The coding regions are highly conserved compared with the non-coding regions, and the two IR regions are less divergent than the LSC and SSC regions. Among the four cp genomes, the most divergent regions were trnH-psbA, psbM-petN, trnC-GCA-petN, trnE-UUC-rpoB, trnY-GUA-trnE-UUC, trnV-UAC-ndhC, rbcL-accD, and accD-psaI in the LSC and rpl32-trnL-UAG, ndhI-ndhG, and ycf1-rps15 in the SSC, whereas the four ribosomal RNA genes (rrn4.5, rrn5, rrn16, and rrn23) were the most conserved. Similar results have been observed in other plant cp genomes.

Cp genomes are highly conserved and contain a large amount of genetic information. The noncoding regions are less conserved than the coding regions [70,71]. The genes trnK-UUU, rps16, trnG-UCC, atpF, rpoC1, trnV-UAC, rps12, rpl2, ndhB, trnI-GAU, trnA-UGC, and ndhA have one intron each, while clpP and ycf3 contain two introns. A trans-splicing event was also observed in the rps12 gene (Table 4). Previous studies have reported that ycf3 is necessary for the stable accumulation of photosystem I complexes [42,72]. Therefore, we believe that the intron gain in ycf3 of A. venetum may provide insight into the evolution of photosynthesis.
Because cp-specific SSRs are inherited from one parent and arise mainly from slipped-strand mispairing during DNA replication, they are often used in population genetics, species identification, and evolutionary studies of wild plants. In addition, the cp genome sequence is highly conserved, so cpSSR primers can be transferred across species and genera. A total of 273 SSRs were detected in the cp genome of A. venetum, comprising mono-, di-, tri-, tetra-, and pentanucleotide repeats. The average SSR density was 1.809 SSRs/kb, with A/T as the main component. These cpSSR markers could be used in future studies of the genetic structure, diversity, and differentiation of A. venetum and its related species.
The phylogenetic positions of the 21 cp genomes were resolved with full bootstrap support at almost all nodes. The phylogenetic tree was constructed by ML, with V. vinifera as the outgroup. In this method, an initial tree is first built using a fast but suboptimal method such as neighbor-joining, and its branch lengths are adjusted to maximize the likelihood of the data set for that tree topology under the desired model of evolution. The results show that A. venetum is most closely related to L. japonica, N. attenuata, C. annuum, S. tuberosum, and S. lycopersicum.

Conclusion
We analyzed and illustrated the complete cp genome of A. venetum. The cp genome is conserved and similar to those of other species of Apocynum. These results provide a reference for the complete assembly of Apocynaceae cp genomes, which may aid future breeding and research efforts. They may also assist in the development of DNA barcodes specific to Apocynaceae and in determining the evolutionary history of the family.